Goto

Collaborating Authors

 memory state


A Hierarchical Language Model with Predictable Scaling Laws and Provable Benefits of Reasoning

arXiv.org Machine Learning

We introduce a family of synthetic languages with hierarchical structure -- generated by a broadcast process on trees -- for which the role of context length and reasoning in autoregressive generation can be analyzed precisely. At the heart of our analytic approach is an \emph{exact $k$-gram ansatz} in place of transformers with context length $k$, a substitution we then validate empirically. Using this ansatz we derive explicit asymptotic predictions for distributional statistics of the sequences produced by a trained model, instantiated in two settings. For the \emph{Ising broadcast process} (a soft-constrained language), we prove that the variance of the generated sum scales log-linearly in the context depth and its kurtosis converges to that of a Gaussian -- both deviating from the true language for any sublinear context. For the \emph{coloring broadcast process} (a hard-constrained language) in the freezing regime, bounded-context autoregression produces sequences that, with high probability, are inconsistent with \emph{any} valid coloring of the underlying tree. Together these results imply an $ฮฉ(n)$ lower bound on the context length required to faithfully sample length-$n$ sequences. In contrast, we prove that an autoregressive \emph{reasoning} model with only $ฮ˜(\log n)$ working memory can sample exactly from the true language -- an exponential improvement. We confirm both the lower-bound predictions and the reasoning-based upper bound empirically with transformers trained on the synthetic language; the trained models track our asymptotic predictions quantitatively across a wide range of context sizes.



Cognitive BASIC: An In-Model Interpreted Reasoning Language for LLMs

arXiv.org Artificial Intelligence

Cognitive BASIC is a minimal, BASIC-style prompting language and in-model interpreter that structures large language model (LLM) reasoning into explicit, stepwise execution traces. Inspired by the simplicity of retro BASIC, we repurpose numbered lines and simple commands as an interpretable cognitive control layer. Modern LLMs can reliably simulate such short programs, enabling transparent multi-step reasoning inside the model. A natural-language interpreter file specifies command semantics, memory updates, and logging behavior. Our mental-model interpreter extracts declarative and procedural knowledge, detects contradictions, and produces resolutions when necessary. A comparison across three LLMs on a benchmark of knowledge extraction, conflict detection, and reasoning tasks shows that all models can execute Cognitive BASIC programs, with overall strong but not uniform performance.


c213877427b46fa96cff6c39e837ccee-AuthorFeedback.pdf

Neural Information Processing Systems

We would like to thank the reviewers for their comments. In the following, we address the points that were raised. It will be better to highlight'moving objects' in We hope this will further extend the impact of our work for the community. We will be happy to add them to the main body of the revised article. We confirm that the dataset will be released upon acceptance of the paper.


Language Modeling With Factorization Memory

arXiv.org Artificial Intelligence

We propose Factorization Memory, an efficient recurrent neural network (RNN) architecture that achieves performance comparable to Transformer models on short-context language modeling tasks while also demonstrating superior generalization in long-context scenarios. Our model builds upon Mamba-2, enabling Factorization Memory to exploit parallel computations during training while preserving constant computational and memory complexity during inference. To further optimize model efficiency and representational capacity, we develop a sparse formulation of Factorization Memory that updates only a subset of recurrent states at each step while preserving the strong performance of its dense counterpart. To our knowledge, this represents the first RNN architecture that successfully combines sparse memory activation with competitive performance across both short and long-context settings. This work provides a systematic empirical analysis of Factorization Memory in comparison to Transformer and Mamba-2 architectures. Transformer-based language modeling (Brown et al., 2020) has significantly advanced natural language processing (NLP) through multitask fine-tuning (Taori et al., 2023; Sanh et al., 2021). This paradigm shift has redefined NLP development, moving from training task-specific models to building general models capable of solving multiple tasks. A particularly challenging frontier is ultra-long-context understanding, where traditional models encounter fundamental limitations.


Group size effects and collective misalignment in LLM multi-agent systems

arXiv.org Artificial Intelligence

Multi-agent systems of large language models (LLMs) are rapidly expanding across domains, introducing dynamics not captured by single-agent evaluations. Yet, existing work has mostly contrasted the behavior of a single agent with that of a collective of fixed size, leaving open a central question: how does group size shape dynamics? Here, we move beyond this dichotomy and systematically explore outcomes across the full range of group sizes. We focus on multi-agent misalignment, building on recent evidence that interacting LLMs playing a simple coordination game can generate collective biases absent in individual models. First, we show that collective bias is a deeper phenomenon than previously assessed: interaction can amplify individual biases, introduce new ones, or override model-level preferences. Second, we demonstrate that group size affects the dynamics in a non-linear way, revealing model-dependent dynamical regimes. Finally, we develop a mean-field analytical approach and show that, above a critical population size, simulations converge to deterministic predictions that expose the basins of attraction of competing equilibria. These findings establish group size as a key driver of multi-agent dynamics and highlight the need to consider population-level effects when deploying LLM-based systems at scale.


Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models

arXiv.org Artificial Intelligence

The Transformer architecture has become the de facto standard for Large Language Models (LLMs), demonstrating remarkable capabilities in language understanding and generation. However, its application in conversational AI is fundamentally constrained by its stateless nature and the quadratic computational complexity ($O(L^2)$) with respect to sequence length $L$. Current models emulate memory by reprocessing an ever-expanding conversation history with each turn, leading to prohibitive costs and latency in long dialogues. This paper introduces the Reactive Transformer (RxT), a novel architecture designed to overcome these limitations by shifting from a data-driven to an event-driven paradigm. RxT processes each conversational turn as a discrete event in real-time, maintaining context in an integrated, fixed-size Short-Term Memory (STM) system. The architecture features a distinct operational cycle where a generator-decoder produces a response based on the current query and the previous memory state, after which a memory-encoder and a dedicated Memory Attention network asynchronously update the STM with a representation of the complete interaction. This design fundamentally alters the scaling dynamics, reducing the total user-facing cost of a conversation from quadratic ($O(N^2 \cdot T)$) to linear ($O(N \cdot T)$) with respect to the number of interactions $N$. By decoupling response generation from memory updates, RxT achieves low latency, enabling truly real-time, stateful, and economically viable long-form conversations. We validated our architecture with a series of proof-of-concept experiments on synthetic data, demonstrating superior performance and constant-time inference latency compared to a baseline stateless model of comparable size.




OSI Stack Redesign for Quantum Networks: Requirements, Technologies, Challenges, and Future Directions

arXiv.org Artificial Intelligence

Quantum communication is poised to become a foundational element of next-generation networking, offering transformative capabilities in security, entanglement-based connectivity, and computational offloading. However, the classical OSI model-designed for deterministic and error-tolerant systems-cannot support quantum-specific phenomena such as coherence fragility, probabilistic entanglement, and the no-cloning theorem. This paper provides a comprehensive survey and proposes an architectural redesign of the OSI model for quantum networks in the context of 7G. We introduce a Quantum-Converged OSI stack by extending the classical model with Layer 0 (Quantum Substrate) and Layer 8 (Cognitive Intent), supporting entanglement, teleportation, and semantic orchestration via LLMs and QML. Each layer is redefined to incorporate quantum mechanisms such as enhanced MAC protocols, fidelity-aware routing, and twin-based applications. This survey consolidates over 150 research works from IEEE, ACM, MDPI, arXiv, and Web of Science (2018-2025), classifying them by OSI layer, enabling technologies such as QKD, QEC, PQC, and RIS, and use cases such as satellite QKD, UAV swarms, and quantum IoT. A taxonomy of cross-layer enablers-such as hybrid quantum-classical control, metadata-driven orchestration, and blockchain-integrated quantum trust-is provided, along with simulation tools including NetSquid, QuNetSim, and QuISP. We present several domain-specific applications, including quantum healthcare telemetry, entangled vehicular networks, and satellite mesh overlays. An evaluation framework is proposed based on entropy throughput, coherence latency, and entanglement fidelity. Key future directions include programmable quantum stacks, digital twins, and AI-defined QNet agents, laying the groundwork for a scalable, intelligent, and quantum-compliant OSI framework for 7G and beyond.